Search results for "Distributed memory"
showing 10 items of 13 documents
Parallel Schwarz methods for convection-dominated semilinear diffusion problems
2002
AbstractParallel two-level Schwarz methods are proposed for the numerical solution of convection-diffusion problems, with the emphasis on convection-dominated problems. Two variants of the methodology are investigated. They differ from each other by the type of boundary conditions (Dirichlet- or Neumann-type) posed on a part of the second-level subdomain interfaces. Convergence properties of the two-level Schwarz methods are experimentally compared with those of a variant of the standard multi-domain Schwarz alternating method. Numerical experiments performed on a distributed memory multiprocessor computer illustrate parallel efficiency of the methods.
The differences between distributed shared memory caching and proxy caching
2000
The authors discuss the similarities in caching between the extensively studied distributed shared memory systems and the emerging proxy systems. They believe that several of the techniques used in distributed shared memory systems can be adapted and applied to proxy systems.
A Low Cost Solution for 2D Memory Access
2006
Many of the new coding tools in the H.264/AVC video coding standard are based on 2D processing resulting in row-wise and column-wise memory accesses starting from arbitrary memory locations. This paper proposes a low cost solution for efficient realization of these 2D block memory accesses on sub-word parallel processors. It is based on the use of simple register-based data permutation networks placed between the processor and memory. The data rearrangement capabilities of the networks can further be extended with more complex control schemes. With the proposed control schemes, the networks enable row and column accesses from arbitrary memory locations for blocks of data while maintaining f…
Analyzing the Energy Efficiency of the Memory Subsystem in Multicore Processors
2014
In this paper we analyze the energy overhead incurred when operating with data stored in different levels of the memory subsystem (cache levels and DDR chips) of current multicore architectures. Our approach builds upon servet, a portable framework for the memory characterization of multicore processors, extending this suite with a power-related test that, when applied to a platform equipped with a power measurement mechanism, provides information on the efficiency of memory energy usage. As additional contributions, i) we provide a complete experimental study of the impact that the CPU performance states (also known as P-states) exert on the memory energy efficiency of a collection of rece…
PenRed: An extensible and parallel Monte-Carlo framework for radiation transport based on PENELOPE
2021
Monte Carlo methods provide detailed and accurate results for radiation transport simulations. Unfortunately, the high computational cost of these methods limits its usage in real-time applications. Moreover, existing computer codes do not provide a methodology for adapting these kind of simulations to specific problems without advanced knowledge of the corresponding code system, and this restricts their applicability. To help solve these current limitations, we present PenRed, a general-purpose, stand-alone, extensible and modular framework code based on PENELOPE for parallel Monte Carlo simulations of electron-photon transport through matter. It has been implemented in C++ programming lan…
Comparison of parallel implementation of some multi-level Schwarz methods for singularly perturbed parabolic problems
1999
Abstract Parallel multi-level algorithms combining a time discretization and an overlapping domain decomposition technique are applied to the numerical solution of singularly perturbed parabolic problems. Two methods based on the Schwarz alternating procedure are considered: a two-level method with auxiliary “correcting” subproblems as well as a three-level method with auxiliary “predicting” and “correcting” subproblems. Moreover, modifications of the methods using time extrapolation on subdomain interfaces are investigated. The emphasis is given to the description of the algorithms as well as their computer realization on a distributed memory multiprocessor computer. Numerical experiments …
Moving Learning Machine Towards Fast Real-Time Applications: A High-Speed FPGA-based Implementation of the OS-ELM Training Algorithm
2018
Currently, there are some emerging online learning applications handling data streams in real-time. The On-line Sequential Extreme Learning Machine (OS-ELM) has been successfully used in real-time condition prediction applications because of its good generalization performance at an extreme learning speed, but the number of trainings by a second (training frequency) achieved in these continuous learning applications has to be further reduced. This paper proposes a performance-optimized implementation of the OS-ELM training algorithm when it is applied to real-time applications. In this case, the natural way of feeding the training of the neural network is one-by-one, i.e., training the neur…
MSAProbs-MPI: parallel multiple sequence aligner for distributed-memory systems
2016
This is a pre-copyedited, author-produced version of an article accepted for publication in Bioinformatics following peer review. The version of recordJorge González-Domínguez, Yongchao Liu, Juan Touriño, Bertil Schmidt; MSAProbs-MPI: parallel multiple sequence aligner for distributed-memory systems, Bioinformatics, Volume 32, Issue 24, 15 December 2016, Pages 3826–3828, https://doi.org/10.1093/bioinformatics/btw558is available online at: https://doi.org/10.1093/bioinformatics/btw558 [Abstracts] MSAProbs is a state-of-the-art protein multiple sequence alignment tool based on hidden Markov models. It can achieve high alignment accuracy at the expense of relatively long runtimes for large-sca…
Unified Parallel C++
2018
Abstract Although MPI is commonly used for parallel programming on distributed-memory systems, Partitioned Global Address Space (PGAS) approaches are gaining attention for programming modern multi-core CPU clusters. They feature a hybrid memory abstraction: distributed memory is viewed as a shared memory that is partitioned among nodes in order to simplify programming. In this chapter you will learn about Unified Parallel C++ (UPC++), a library-based extension of C++ that gathers the advantages of both PGAS and Object Oriented paradigms. The examples included in this chapter will help you to understand the main features of PGAS languages and how they can simplify the task of programming par…
Distributed Computing on Distributed Memory
2018
Distributed computation is formalized in several description languages for computation, as e.g. Unified Modeling Language (UML), Specification and Description Language (SDL), and Concurrent Abstract State Machines (CASM). All these languages focus on the distribution of computation, which is somewhat the same as concurrent computation. In addition, there is also the aspect of distribution of state, which is often neglected. Distribution of state is most commonly represented by communication between active agents. This paper argues that it is desirable to abstract from the communication and to consider abstract distributed state. This includes semantic handling of conflict resolution, e.g. i…